Part A

1.A. Refer to the table above and find the joint probability of the people who planned to purchase and actually placed an order.

1.B. Find the conditional probability that a person actually placed an order, given that they planned to purchase.
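The table referred to in 1.A and 1.B is not reproduced here, so the sketch below takes the relevant cell counts as arguments (the function and parameter names are hypothetical). It shows the difference between the two quantities: the joint probability divides by the grand total, while the conditional probability P(ordered | planned) = P(planned and ordered) / P(planned) divides by the row total of people who planned to purchase.

```python
def purchase_probabilities(planned_and_ordered, planned_total, grand_total):
    """Return (joint, conditional) probabilities from contingency-table counts.

    planned_and_ordered: people who planned to purchase AND placed an order
    planned_total:       row total of people who planned to purchase
    grand_total:         total number of people in the table
    """
    # Joint probability: P(planned and ordered)
    joint = planned_and_ordered / grand_total
    # Conditional probability: P(ordered | planned) = P(planned and ordered) / P(planned)
    conditional = planned_and_ordered / planned_total
    return joint, conditional
```

For example, with hypothetical counts of 400, 500 and 2000, `purchase_probabilities(400, 500, 2000)` returns `(0.2, 0.8)`.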

2. Problem Statement

An electrical manufacturing company conducts quality checks at specified periods on the products it manufactures. Historically, the failure rate for the manufactured item is 5%. Suppose a random sample of 10 manufactured items is selected. Answer the following questions.

Declaring the sample size n = 10
p, the probability of success (here, an item being defective, i.e. the failure rate) = 0.05

  1. Probability that none of the items are defective?
  2. Probability that exactly one of the items is defective?
  3. Probability that two or fewer of the items are defective?
  4. Probability that three or more of the items are defective?
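All four answers follow from the binomial distribution X ~ Binomial(n = 10, p = 0.05), where X counts defective items in the sample. A minimal standard-library sketch (scipy.stats.binom.pmf and binom.cdf would give the same numbers):

```python
from math import comb

N = 10    # sample size
P = 0.05  # probability that a single item is defective

def binom_pmf(k, n=N, p=P):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_none = binom_pmf(0)                                 # 1. P(X = 0)
p_exactly_one = binom_pmf(1)                          # 2. P(X = 1)
p_two_or_fewer = sum(binom_pmf(k) for k in range(3))  # 3. P(X <= 2)
p_three_or_more = 1 - p_two_or_fewer                  # 4. P(X >= 3) = 1 - P(X <= 2)

for label, value in [("P(X = 0)", p_none), ("P(X = 1)", p_exactly_one),
                     ("P(X <= 2)", p_two_or_fewer), ("P(X >= 3)", p_three_or_more)]:
    print(f"{label} = {value:.4f}")
```

Rounded to four decimals this prints 0.5987, 0.3151, 0.9885 and 0.0115. Question 4 uses the complement rule rather than summing P(X = 3) through P(X = 10).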

3. Problem Statement

A car salesman sells an average of 3 cars per week.

3.A. What is the probability that in a given week he will sell some cars?

Here, "some cars" means one or more cars, so the required probability is P(X >= 1) = 1 - P(X = 0), where X is the number of cars sold in a week.
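Modelling weekly sales X as Poisson with mean lambda = 3 (the usual model for counts of events occurring at a constant average rate), P(X >= 1) = 1 - e^(-3). A standard-library sketch:

```python
import math

LAM = 3  # average number of cars sold per week

def poisson_pmf(k, lam=LAM):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# "Some cars" = one or more: P(X >= 1) = 1 - P(X = 0)
p_some = 1 - poisson_pmf(0)
print(f"P(X >= 1) = {p_some:.4f}")  # P(X >= 1) = 0.9502
```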

3.B. What is the probability that in a given week he will sell 2 or more but fewer than 5 cars?
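"2 or more but fewer than 5" covers X = 2, 3 and 4, so the answer is the sum of those three Poisson probabilities (equivalently P(X <= 4) - P(X <= 1), e.g. `poisson.cdf(4, 3) - poisson.cdf(1, 3)` with scipy.stats). A sketch under the same Poisson(3) assumption:

```python
import math

LAM = 3  # average number of cars sold per week

def poisson_pmf(k, lam=LAM):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# P(2 <= X < 5) = P(X = 2) + P(X = 3) + P(X = 4)
p_2_to_4 = sum(poisson_pmf(k) for k in range(2, 5))
print(f"P(2 <= X < 5) = {p_2_to_4:.4f}")  # P(2 <= X < 5) = 0.6161
```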

3.C. Plot the cumulative Poisson distribution function: cumulative probability of cars sold per week vs. number of cars sold per week.
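A sketch of the data behind the plot: it computes P(X <= k) for k = 0..10 (the upper limit of 10 cars is an assumption) and defines a helper that draws a step chart with matplotlib. The helper name `plot_poisson_cdf` is hypothetical, and matplotlib is imported inside it so the probabilities can be computed without a plotting backend.

```python
import math

LAM = 3        # average number of cars sold per week
MAX_CARS = 10  # assumption: plot range 0..10 cars

def poisson_cdf(k, lam=LAM):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

cars = list(range(MAX_CARS + 1))
cumulative = [poisson_cdf(k) for k in cars]

def plot_poisson_cdf(xs, ys):
    """Step plot of the cumulative probability against the number of cars sold."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for drawing
    plt.step(xs, ys, where="post", marker="o")
    plt.xlabel("Number of cars sold per week")
    plt.ylabel("Cumulative probability P(X <= k)")
    plt.title("Poisson CDF, lambda = 3")
    plt.grid(True)
    plt.show()
```

In the notebook, `plot_poisson_cdf(cars, cumulative)` draws the chart. A step plot suits a discrete variable because its CDF only changes at integer values.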

4. Problem Statement

Accuracy in understanding orders for a speech-based bot at a restaurant is important for Company X, which has designed, marketed and launched the product for contactless delivery due to the COVID-19 pandemic. Recognition accuracy, which measures the percentage of orders that are taken correctly, is 86.8%. Suppose that you place an order with the bot and two of your friends independently place orders with the same bot.
Answer the following questions.

Let P_A = 0.868 be the probability that an order is recognised correctly.
Let P_B = 1 - P_A = 0.132 be the probability that an order is not recognised correctly.

4.A. What is the probability that all three orders will be recognised correctly?
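Because the three orders are placed independently, the probability that all three are recognised correctly is the product P_A^3. A minimal sketch:

```python
P_A = 0.868  # probability that an order is recognised correctly

# Independent orders: multiply the individual probabilities
p_all_three = P_A ** 3
print(f"P(all three correct) = {p_all_three:.4f}")  # P(all three correct) = 0.6540
```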


4.B. What is the probability that none of the three orders will be recognised correctly?
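By the same independence argument, the probability that none of the three orders is recognised correctly is P_B^3 with P_B = 1 - P_A. A minimal sketch:

```python
P_A = 0.868    # probability that an order is recognised correctly
P_B = 1 - P_A  # probability that an order is NOT recognised correctly

p_none_correct = P_B ** 3
print(f"P(none correct) = {p_none_correct:.4f}")  # P(none correct) = 0.0023
```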


4.C. What is the probability that at least two of the three orders will be recognised correctly?
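"At least two" means exactly two or exactly three correct, so the count of correct orders can be treated as X ~ Binomial(3, P_A) and P(X >= 2) = 3 * P_A^2 * P_B + P_A^3. A sketch (the helper name `p_exactly` is illustrative):

```python
from math import comb

P_A = 0.868    # probability that an order is recognised correctly
P_B = 1 - P_A  # probability that an order is NOT recognised correctly
N_ORDERS = 3

def p_exactly(k):
    """P(exactly k of the 3 orders recognised correctly), X ~ Binomial(3, P_A)."""
    return comb(N_ORDERS, k) * P_A**k * P_B ** (N_ORDERS - k)

p_at_least_two = p_exactly(2) + p_exactly(3)
print(f"P(at least two correct) = {p_at_least_two:.4f}")  # P(at least two correct) = 0.9523
```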

5. Problem Statement

Explain one real-life industry scenario (other than the ones mentioned above) where you can use the concepts learnt in this module of Applied Statistics to arrive at a data-driven business solution.

If the frequency of a certain health condition in the population is 10%, what is the probability that among 10 patients a doctor will see no more than 2 patients with that condition? Whether or not a random patient has the condition is like a (biased) coin flip: they either do or do not, with probability of having the condition ("success") p = 0.1.

Among n = 10 patients, the number X with the condition therefore follows a binomial distribution, and the probability of 2 or fewer having the condition, P(X <= 2), can be found using the binomial cumulative distribution function.
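A standard-library sketch of that binomial CDF for n = 10 and p = 0.1 (scipy.stats.binom.cdf(2, 10, 0.1) would give the same value):

```python
from math import comb

N_PATIENTS = 10   # patients seen by the doctor
P_CONDITION = 0.1 # prevalence of the condition in the population

def binom_cdf(k, n=N_PATIENTS, p=P_CONDITION):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

p_at_most_two = binom_cdf(2)
print(f"P(X <= 2) = {p_at_most_two:.4f}")  # P(X <= 2) = 0.9298
```

In business terms, a clinic could use this to judge whether seeing three or more such patients in a batch of ten is unusual enough to warrant attention.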


Part B

Context :

Company X manages the men's top professional basketball division of the American league system. The dataset contains information on all the teams that have participated in all the past tournaments. It has data about how many baskets each team scored and conceded, how many times they finished within the first 2 positions, how many tournaments they have qualified for, their best position in the past, etc.

DATA DESCRIPTION:-

Basketball.csv - The data set contains information on all the teams that have participated in the past tournaments so far.

DATA DICTIONARY

  1. Team: Team’s name
  2. Tournament: Number of played tournaments.
  3. Score: Team’s score so far.
  4. PlayedGames: Games played by the team so far.
  5. WonGames: Games won by the team so far.
  6. DrawnGames: Games drawn by the team so far.
  7. LostGames: Games lost by the team so far.
  8. BasketScored: Baskets scored by the team so far.
  9. BasketGiven: Baskets scored against the team so far.
  10. TournamentChampion: How many times the team was a champion of the tournaments so far.
  11. Runner-up: How many times the team was a runner-up of the tournaments so far.
  12. TeamLaunch: Year the team was launched in professional basketball.
  13. HighestPositionHeld: Highest position held by the team amongst all the tournaments played.

PROJECT OBJECTIVE:

The company’s management wants to invest in proposals on managing some of the best teams in the league. The analytics department has been assigned the task of creating a report on the performance shown by the teams. Some of the older teams are already under contract with competitors, so Company X wants to understand which teams it can approach for a deal win.


Preparing the data for analysis

We found '-' placeholder values in some of the columns, and these need to be treated before analysis.

We replaced every '-' in the data with 0 (imputation).
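A minimal pandas sketch of that imputation, assuming the '-' cells come in as strings and that Team and TeamLaunch are the only non-numeric columns (both assumptions; the helper name `impute_dashes` is hypothetical). Converting with `pd.to_numeric` afterwards matters because a column containing '-' is read as text, not numbers.

```python
import pandas as pd

def impute_dashes(df, text_columns=("Team", "TeamLaunch")):
    """Replace '-' placeholders with 0 and convert the remaining columns to numbers."""
    cleaned = df.copy()
    for col in cleaned.columns:
        if col in text_columns:
            continue  # leave identifier / free-text columns untouched
        cleaned[col] = pd.to_numeric(cleaned[col].replace("-", "0"))
    return cleaned

# Hypothetical rows for illustration; in the notebook this is the DataFrame read from Basketball.csv
raw = pd.DataFrame({
    "Team": ["Team 1", "Team 2"],
    "Score": ["4385", "-"],
    "TournamentChampion": ["33", "-"],
})
df1 = impute_dashes(raw)
print(df1)
```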

We found that the TeamLaunch values are not consistently formatted, so we reformat each value to its initial year and store it as launchYear.
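One way to take the initial year is to extract the first four-digit number from each TeamLaunch value. The sample formats below ("1941to42", "1977-78", ...) are hypothetical illustrations, not taken from the dataset, and `add_launch_year` is a hypothetical helper name:

```python
import pandas as pd

def add_launch_year(df):
    """Add launchYear: the first four-digit year found in TeamLaunch."""
    out = df.copy()
    first_year = out["TeamLaunch"].astype(str).str.extract(r"(\d{4})", expand=False)
    out["launchYear"] = first_year.astype(int)  # raises if a value has no 4-digit year
    return out

# Hypothetical TeamLaunch formats, for illustration only
sample = pd.DataFrame({
    "Team": ["Team 1", "Team 2", "Team 3", "Team 4"],
    "TeamLaunch": ["1929", "1941to42", "1977-78", "1999_00"],
})
print(add_launch_year(sample)[["TeamLaunch", "launchYear"]])
```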

The cleaned DataFrame df1 is now ready for analysis.


Exploratory Analysis for insights

Creating new columns to derive insights
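The notebook's exact new columns are not listed here, so the following is a hypothetical sketch of typical performance features: win percentage, basket difference and team age. The column names and the reference year (2021) are assumptions.

```python
import pandas as pd

REFERENCE_YEAR = 2021  # assumption: the year against which team age is measured

def add_insight_columns(df, reference_year=REFERENCE_YEAR):
    """Add win percentage, basket difference and team age (hypothetical column names)."""
    out = df.copy()
    # Teams with 0 played games (e.g. after imputing '-' with 0) get NaN instead of dividing by zero
    games = out["PlayedGames"].replace(0, float("nan"))
    out["WinPercentage"] = out["WonGames"] * 100 / games
    out["BasketDifference"] = out["BasketScored"] - out["BasketGiven"]
    out["TeamAge"] = reference_year - out["launchYear"]
    return out

# Hypothetical rows, for illustration only
sample = pd.DataFrame({
    "Team": ["Team 1", "Team 2"],
    "PlayedGames": [100, 0],
    "WonGames": [60, 0],
    "BasketScored": [500, 10],
    "BasketGiven": [400, 20],
    "launchYear": [1950, 2010],
})
print(add_insight_columns(sample))
```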

We are given that some of the older teams are already under contract with competitors, so we remove the oldest teams and derive insights from the remaining teams, which could be approached for a deal based on their expected future performance.

The resulting table shows the top teams ranked by the number of tournaments won, along with their age.
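The filter-then-rank step can be sketched as below. The cut-off launch year that separates "older" teams is not stated in the text, so it is a parameter here; using runner-up finishes to break ties is also an assumption, as is the helper name `top_candidate_teams`.

```python
import pandas as pd

def top_candidate_teams(df, min_launch_year, reference_year, top_n=5):
    """Drop the older teams, then rank the rest by titles won (runner-up finishes break ties)."""
    recent = df.loc[df["launchYear"] >= min_launch_year].copy()
    recent["TeamAge"] = reference_year - recent["launchYear"]
    ranked = recent.sort_values(["TournamentChampion", "Runner-up"], ascending=False)
    return ranked.head(top_n)[["Team", "TeamAge", "TournamentChampion", "Runner-up"]]

# Hypothetical rows, for illustration only
teams = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C", "Team D"],
    "launchYear": [1930, 1960, 1980, 1990],
    "TournamentChampion": [10, 3, 5, 0],
    "Runner-up": [8, 4, 2, 1],
})
print(top_candidate_teams(teams, min_launch_year=1950, reference_year=2021))
```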

3. Task

Suggestions to the association management on the quality, quantity, variety, velocity, veracity, etc. of the data points collected by the association, to enable better data analysis in the future.

Part C

Context :-

Company X is an EU online publisher focusing on the startup industry. The company specifically reports on technology business news, analysis of emerging trends and profiles of new tech businesses and products. Their event, Startup Battlefield, is the world’s pre-eminent startup competition. Startup Battlefield features 15-30 top early-stage startups pitching to top judges in front of a vast live audience, present in person and online.

DATA DESCRIPTION:-

CompanyX_EU.csv - Each row in the dataset is a Start-up company and the columns describe the company

DATA DICTIONARY:

  1. Startup: Name of the company
  2. Product: Actual product
  3. Funding: Funds raised by the company in USD
  4. Event: The event the company participated in
  5. Result: One of Contestant, Finalist, Audience choice, Winner or Runner up
  6. OperatingState: Current status of the company: Operating, Closed, Acquired or IPO

PROJECT OBJECTIVE:

Analyse the data of the various companies in the given dataset and perform the tasks specified in the steps below. Draw insights from the various attributes present in the dataset, plot distributions, state hypotheses and draw conclusions from the dataset.


Data Exploration

The DataFrame summary shows that there are missing values in the Funding and Product columns.

Data preprocessing & visualisation:


Converting the ‘Funding’ feature into a numerical value expressed in millions (Funds_in_million)
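A standard-library sketch of the conversion, assuming the Funding strings look like "$630K", "$1.1M" or "$2B" (the exact format is not shown in the text; `funding_to_millions` is a hypothetical helper). Missing values stay NaN so they can be dropped or counted later.

```python
import math

# Assumption: a K / M / B suffix gives the scale; converted here to millions of USD
_TO_MILLIONS = {"K": 1e-3, "M": 1.0, "B": 1e3}

def funding_to_millions(value):
    """Convert a Funding string to a float in millions of USD (NaN if missing)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return math.nan
    text = str(value).strip().lstrip("$").replace(",", "").upper()
    if not text:
        return math.nan
    suffix = text[-1]
    if suffix in _TO_MILLIONS:
        return float(text[:-1]) * _TO_MILLIONS[suffix]
    return float(text) / 1e6  # plain number of dollars
```

In the notebook this would be applied as `df["Funds_in_million"] = df["Funding"].apply(funding_to_millions)`.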

Creating a boxplot of the funds in millions.

Checking the number of outliers present in the Funds_in_million column using the IQR method

Let's see the distribution after removing the outliers
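A sketch of the IQR method with pandas: values below Q1 - k*IQR or above Q3 + k*IQR are flagged. The multiplier k = 1.5 (Tukey's rule, also used for boxplot whiskers) is an assumption, since the text does not state it; the sample values are hypothetical.

```python
import pandas as pd

def iqr_outlier_mask(series, k=1.5):
    """True for values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN values are never flagged."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Hypothetical values in millions, for illustration only
funds = pd.Series([0.5, 1.0, 1.2, 2.0, 3.0, 4.0, 50.0], name="Funds_in_million")
mask = iqr_outlier_mask(funds)
print("Number of outliers:", int(mask.sum()))

funds_no_outliers = funds[~mask]
print(funds_no_outliers.describe())
```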

The summary statistics show that 75% of the companies have raised less than 5 million USD in funds, and the maximum value is 22 million USD.



Statistical Analysis

Null hypothesis (H0): There is no difference between the funds raised by companies that are still operating and companies that closed down.

Alternative hypothesis (Ha): There is a difference between the funds raised by companies that are still operating and companies that closed down.
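The text does not say which test was used for this hypothesis pair. One reasonable choice, assumed here, is Welch's two-sample t-test (`scipy.stats.ttest_ind` with `equal_var=False`), which does not require the two groups to have equal variances. The helper name is hypothetical.

```python
from scipy import stats

ALPHA = 0.05  # significance level

def funding_difference_test(operating_funds, closed_funds, alpha=ALPHA):
    """Welch's two-sample t-test on funds raised (in millions).

    Returns (t statistic, p-value, whether H0 is rejected at alpha).
    """
    result = stats.ttest_ind(operating_funds, closed_funds, equal_var=False)
    return result.statistic, result.pvalue, result.pvalue < alpha
```

The two groups would be, for example, `df.loc[df["OperatingState"] == "Operating", "Funds_in_million"].dropna()` and the same selection for `"Closed"`.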

Creating a copy of the original DataFrame

Frequency distribution of the Result variable

There are 332 contestants and 19 winners that are still operating, so we combine the other Result categories (Finalist, Audience choice, Runner up) into Winner.



Percentage of winners that are still operating and percentage of contestants that are still operating
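The merge and the two percentages can be sketched with pandas as follows. The category spellings are taken from the data dictionary and assumed to match the CSV exactly; the helper name and sample rows are hypothetical.

```python
import pandas as pd

# Assumption: category spellings as listed in the data dictionary
MERGE_INTO_WINNER = {"Finalist": "Winner", "Audience choice": "Winner", "Runner up": "Winner"}

def operating_percentage_by_result(df):
    """Merge non-contestant results into 'Winner' and return % still operating per group."""
    grouped = df["Result"].replace(MERGE_INTO_WINNER)
    is_operating = df["OperatingState"].eq("Operating")
    return is_operating.groupby(grouped).mean() * 100

# Hypothetical rows, for illustration only
sample = pd.DataFrame({
    "Result": ["Contestant", "Contestant", "Finalist", "Winner", "Runner up", "Contestant"],
    "OperatingState": ["Operating", "Closed", "Operating", "Operating", "Operating", "Operating"],
})
print(operating_percentage_by_result(sample))
```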

Null hypothesis (H0): The proportion of companies that are operating is the same for WINNERS and CONTESTANTS

Alternative hypothesis (Ha): The proportion of companies that are operating differs between WINNERS and CONTESTANTS
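The text does not state which test produced the p-value of 0.037; a standard choice for comparing two proportions, assumed here, is the pooled two-proportion z-test. The inputs would be the operating counts per group (19 winners, 332 contestants) and the group totals from the frequency table, which are not reproduced here. A standard-library sketch:

```python
import math

def two_proportion_ztest(success_1, n_1, success_2, n_2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion.

    Returns (z statistic, p-value).
    """
    p1, p2 = success_1 / n_1, success_2 / n_2
    pooled = (success_1 + success_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

With hypothetical counts, `two_proportion_ztest(60, 100, 40, 100)` gives z of about 2.83 and a p-value of about 0.0047. This test is equivalent to a chi-square test on the 2x2 table without continuity correction.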

Since the p-value (0.037) is less than the significance level alpha = 0.05, the difference is significant: we reject the null hypothesis and conclude that the proportion of companies that are operating is significantly different between WINNERS and CONTESTANTS.


Conclusion:

Selecting only the events that contain the keyword ‘disrupt’, from 2013 onwards
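A pandas sketch of this selection, assuming the Event strings embed a four-digit year (e.g. "Disrupt NYC 2013"); the event names below are hypothetical examples, and matching is case-insensitive so "Disrupt" and "disrupt" both count.

```python
import pandas as pd

def disrupt_events_from(df, start_year=2013):
    """Rows whose Event mentions 'disrupt' (any case) and whose event year is >= start_year."""
    is_disrupt = df["Event"].str.contains("disrupt", case=False, na=False)
    # Assumption: the event year appears as a 4-digit number inside the Event string
    year = pd.to_numeric(df["Event"].str.extract(r"(\d{4})", expand=False), errors="coerce")
    return df.loc[is_disrupt & (year >= start_year)]

# Hypothetical rows, for illustration only
events = pd.DataFrame({"Event": [
    "Disrupt NYC 2013", "Disrupt SF 2011", "TC50 2008",
    "Disrupt London 2015", "Hardware Battlefield 2014",
]})
print(disrupt_events_from(events))
```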